
Tokio 1.0 compatibility #267

Closed

wants to merge 10 commits into from

Conversation

@rupansh commented Nov 12, 2020

Consider using another async TLS implementation!

@saghm
Contributor

saghm commented Nov 12, 2020

Hello! Thanks for the pull request. Updating to tokio 0.3 will be a little tricky because we don't want to break support for applications already using 0.2, so our team will need to do some planning for how to approach this before merging anything updating our support to 0.3. Once we have everything figured out, we'll come back and respond with more info!

@Alexander-L-G

Tracked by https://jira.mongodb.org/browse/RUST-593

@Alexander-L-G added the tracked-in-jira (Ticket filed in Mongo's Jira system) label on Nov 16, 2020
@rupansh
Author

rupansh commented Dec 13, 2020

@saghm tokio 1.0 is planned to be released in a few days (end of December). Should we skip 0.3?

@saghm
Contributor

saghm commented Dec 14, 2020

Yeah, that probably makes sense. I'll discuss with the team and circle back and let you know once we make a decision!

@rupansh changed the title from Tokio 0.3 compatibility to Tokio 1.0 compatibility on Dec 24, 2020
@rupansh
Author

rupansh commented Dec 24, 2020

@saghm tokio 1.0 works with no code changes from 0.3 🎉

@jens1o

jens1o commented Jan 7, 2021

What is the current status here? Is there anything I could help with?

@rupansh force-pushed the master branch 2 times, most recently from 206bcff to db1139a on January 7, 2021 13:29
@saghm
Contributor

saghm commented Jan 7, 2021

What is the current status here? Is there anything I could help with?

Hello! Right now, we're in the process of discussing how to handle the transition with regards to backwards compatibility. If we just switch the driver over to using tokio 1.0 by default rather than adding it as a separate option, this has the potential to break existing code, so we want to make sure we work through all the ramifications before deciding how to proceed. Figuring this out is definitely a high priority for us, and we'll report back here with our plans once we figure them out!

@seanpianka
Contributor

@rupansh Do you want to pin to specifically 1.0.0? Can you pin to 1.0.* or 1.*?

@alexander-jackson

@rupansh Do you want to pin to specifically 1.0.0? Can you pin to 1.0.* or 1.*?

Cargo interprets 1.0.0 as ^1.0.0 when no operator is given, so they're effectively the same thing.

@seanpianka
Contributor

seanpianka commented Jan 10, 2021

@rupansh Is it possible to upgrade trust-dns-proto and trust-dns-resolver to 0.20.0? It would be helpful for those concerned about binary sizes to remove the multiple versions of the Tokio crate dependency.

Likewise with reqwest, it can be updated to 0.11, which will upgrade the tokio/hyper dependency there, as well.

@alexander-jackson

@seanpianka I made a fork of @rupansh's repository a few days ago and made those changes; for the moment, you can find it here. It just upgrades the three you mentioned (reqwest, trust-dns-proto and trust-dns-resolver) to their latest Tokio 1.0 versions so there are no older versions in the binary. I'm currently using it as a Git dependency for a project I'm working on.

It's probably worth someone more familiar taking a look at the commit where I added support for this, as it's my first time working with the 2 trust-dns-* libraries. In any case, it passes the same tests as the master branch here.

I wasn't sure of Git etiquette, so instead of opening another PR from my fork to this repository with commits from both me and @rupansh, I just opened a PR into their repository with the additional changes. If it's preferable, I can just close that and open a new PR directly here.

@FreddieBrown

FreddieBrown commented Jan 12, 2021

I used the fork @alexander-jackson wrote on a joint project of ours that uses tokio 1.0 and connects to Atlas. We found it times out when trying to access the Atlas instance (a simple read): server selection fails, saying there are no available primary instances. I have checked Atlas and a previous build of our project, and this isn't true, so it's something to do with the updated libraries.

@seanpianka
Contributor

seanpianka commented Jan 12, 2021

I can confirm that the updated fork with Tokio 1.0 has issues with sporadic timeouts.

After rebasing rupansh's fork with the upstream master commits (2 new commits, as of right now), I'm now seeing compilation failures related to the Tokio sync receiver/sender:

   Compiling mongodb v1.1.1 (https://github.com/seanpianka/mongo-rust-driver.git#a9e51531)
error[E0599]: no method named `recv` found for struct `tokio::sync::watch::Receiver<_>` in the current scope
  --> /Users/sean/.cargo/git/checkouts/mongo-rust-driver-0fda0d65eeba4433/a9e5153/src/cmap/status.rs:22:37
   |
22 |     RUNTIME.block_in_place(receiver.recv());
   |                                     ^^^^ method not found in `tokio::sync::watch::Receiver<_>`

error[E0599]: no method named `broadcast` found for struct `tokio::sync::watch::Sender<PoolStatus>` in the current scope
  --> /Users/sean/.cargo/git/checkouts/mongo-rust-driver-0fda0d65eeba4433/a9e5153/src/cmap/status.rs:43:56
   |
43 |         let _: std::result::Result<_, _> = self.sender.broadcast(new_status);
   |                                                        ^^^^^^^^^ method not found in `tokio::sync::watch::Sender<PoolStatus>`

error: aborting due to 2 previous errors

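For reference, these errors come from the reworked watch channel in tokio 1.0. A minimal, hedged sketch of how such call sites typically map onto the new API (illustrative only, not this PR's actual fix; PoolStatus is a stand-in type here):

```rust
// Illustrative only: in tokio 1.0, watch::Receiver::recv from 0.2 maps to
// changed() + borrow(), and watch::Sender::broadcast was renamed to send().
use tokio::sync::watch;

#[derive(Clone, Debug, Default)]
struct PoolStatus; // stand-in for the driver's type in the errors above

async fn wait_for_status(receiver: &mut watch::Receiver<PoolStatus>) -> Option<PoolStatus> {
    // changed() resolves when a new value is published; Err means the sender was dropped.
    receiver.changed().await.ok()?;
    Some(receiver.borrow().clone())
}

fn publish_status(sender: &watch::Sender<PoolStatus>, new_status: PoolStatus) {
    // send() replaces broadcast(); Err here means every receiver has been dropped.
    let _ = sender.send(new_status);
}
```
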
@seanpianka
Contributor

We need to review the breaking changes from the 0.2 API to the 1.0 API (0.3 was effectively a beta for 1.0): https://github.com/tokio-rs/tokio/releases/tag/tokio-0.3.0

@alexander-jackson

It's probably also worth checking the changes to trust-dns-proto and trust-dns-resolver if there are timeout issues: https://github.com/bluejekyll/trust-dns/releases/tag/v0.20.0

@glebpom

glebpom commented Jan 12, 2021

I'm observing the same problem with the Atlas connection. Debugging shows that the TLS handshake just never finishes. I could fix it by reverting the code of try_connect (the tokio variant). Unfortunately, tokio doesn't support set_keepalive in the new release (tokio-rs/tokio#3274).

@seanpianka
Contributor

@glebpom Could you share your debugging steps? I would like to contribute an extra data point to the discussion :)

@glebpom

glebpom commented Jan 13, 2021

@seanpianka I just used println! to check where the code gets stuck while trying to perform the query. I ended up at this line, which never finished.
I can now confirm that this patch works.

@seanpianka
Contributor

seanpianka commented Jan 13, 2021

Is that patch still using the 1.0 API? Could you explain the fix? I'm trying to learn about Tokio and how it's used in this project. It seems like you have a method for handling the connection timeout using the new API, which takes several more steps to set up than the 0.2 API.

@glebpom

glebpom commented Jan 13, 2021

Yes, it's based on the current PR. My goal was just to make it work, not to figure out all the details (at least for now). My observations are as follows:

  1. The code in this PR tries to use a common method to establish the TCP connection for both tokio and async-std.
  2. Instead of using the native runtime implementation of the TCP connection (as before), it relies on the socket2 crate. The reason for that was probably to enable TCP keepalives; that support has been dropped in the latest tokio version.
  3. There is a serious issue in this code: it uses a synchronous TCP connect, which will block the executor. In contrast, the patched version uses the same async implementation as before and doesn't introduce blocking (see the sketch below the list).
  4. UPDATE: the reason was probably that the socket was not manually set to non-blocking mode. Anyway, due to 3., it should be refactored.

Maybe src/runtime/stream.rs can be improved as well.
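
To illustrate point 3, a tiny hedged sketch (not this PR's code) of the difference between a synchronous and an asynchronous connect under tokio:

```rust
use std::net::SocketAddr;

// Illustration of point 3 above; not the PR's actual connect path.
async fn connect_examples(addr: SocketAddr) -> std::io::Result<()> {
    // Synchronous: parks the current executor thread for the whole TCP handshake,
    // starving any other tasks scheduled on that worker.
    let _blocking = std::net::TcpStream::connect(addr)?;

    // Asynchronous: awaits readiness and lets the runtime drive other tasks meanwhile.
    let _non_blocking = tokio::net::TcpStream::connect(addr).await?;

    Ok(())
}
```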

Signed-off-by: rupansh-void <rupanshsekar@hotmail.com>
I didn't sleep 2 nights because of this 🤡
Signed-off-by: rupansh-arch <rupanshsekar@hotmail.com>
@seanpianka
Contributor

Thank you for providing a quick patch to this issue, @glebpom. Applying your fix has resolved any observable timeout issues in my code.

avoid unnecessary conversion in tokio runtime

Signed-off-by: rupansh-arch <rupanshsekar@hotmail.com>
@rupansh
Author

rupansh commented Jan 13, 2021

Yes, it's based on the current PR. My goal was just to make it work, not to figure out all the details (at least for now). My observations are as follows:

1. The code in this PR tries to use a common method to establish the TCP connection for both tokio and async-std.

2. Instead of using the native runtime implementation of the TCP connection (as before), it relies on the `socket2` crate. The reason for that was probably to enable TCP keepalives; that support has been dropped in the latest tokio version.

3. There is a serious issue in this code: it uses a synchronous TCP `connect`, which will block the executor. In contrast, the patched version uses the same async implementation as before and doesn't introduce blocking.

4. UPDATE: the reason was probably that the socket was not manually set to non-blocking mode. Anyway, due to 3., it should be refactored.

The approach in 3. was already used for async-std even before this PR, which is why I assumed it would be fine with tokio as well. I have set nonblocking to true regardless; let me know if it makes a difference.

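For context, a minimal sketch of what setting the socket to non-blocking before handing it to tokio 1.0 looks like (illustrative names, not the PR's code):

```rust
use std::net::TcpStream as StdTcpStream;

use tokio::net::TcpStream;

// Hypothetical helper: a socket created or connected outside of tokio must be
// switched to non-blocking mode before it is registered with the runtime.
fn register_with_tokio(std_stream: StdTcpStream) -> std::io::Result<TcpStream> {
    // Without this, reads and writes through tokio can block the executor thread.
    std_stream.set_nonblocking(true)?;
    TcpStream::from_std(std_stream)
}
```
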
I can confirm that the updated fork with Tokio 1.0 has issues with sporadic timeouts.

After rebasing rupansh's fork with the upstream master commits (2 new commits, as of right now), I'm now seeing compilation failures related to the Tokio sync receiver/sender:

   Compiling mongodb v1.1.1 (https://github.com/seanpianka/mongo-rust-driver.git#a9e51531)
error[E0599]: no method named `recv` found for struct `tokio::sync::watch::Receiver<_>` in the current scope
  --> /Users/sean/.cargo/git/checkouts/mongo-rust-driver-0fda0d65eeba4433/a9e5153/src/cmap/status.rs:22:37
   |
22 |     RUNTIME.block_in_place(receiver.recv());
   |                                     ^^^^ method not found in `tokio::sync::watch::Receiver<_>`

error[E0599]: no method named `broadcast` found for struct `tokio::sync::watch::Sender<PoolStatus>` in the current scope
  --> /Users/sean/.cargo/git/checkouts/mongo-rust-driver-0fda0d65eeba4433/a9e5153/src/cmap/status.rs:43:56
   |
43 |         let _: std::result::Result<_, _> = self.sender.broadcast(new_status);
   |                                                        ^^^^^^^^^ method not found in `tokio::sync::watch::Sender<PoolStatus>`

error: aborting due to 2 previous errors

Fixed

I also had to nuke @alexander-jackson's changes because they broke async-std; I didn't notice because I hadn't read them properly. There needs to be some way in trust-dns to use async-std's recv_from. I can think of storing the futures with Pin<Box>, but that needs a mutable reference, and poll_recv_from only expects a shared reference.
As such, we probably won't be using trust-dns 0.20.0 until a workaround is found.

Also, there are probably workloads that depend on keep_alive without realizing it, so use @glebpom's patch, which connects completely asynchronously, with caution.

@glebpom

glebpom commented Jan 13, 2021

It seems like neither runtime supports TCP keepalives directly. At the same time, using a synchronous connect may block the executor completely and break the whole async behavior.
What we can do is establish the connection asynchronously, convert it to std::net::TcpStream, enable keepalives, and convert back to the runtime's implementation of TcpStream. For tokio this is easily achievable through into_std; I believe async-std has similar options.
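
A hedged sketch of that approach, assuming tokio 1.x and socket2 0.4 (the function name and keepalive duration parameter are made up for illustration):

```rust
use std::net::SocketAddr;
use std::time::Duration;

use socket2::{Socket, TcpKeepalive};
use tokio::net::TcpStream;

// Connect asynchronously, drop to a std socket to enable keepalive, then go back to tokio.
async fn connect_with_keepalive(addr: SocketAddr, keepalive: Duration) -> std::io::Result<TcpStream> {
    // The connect itself is fully asynchronous and never blocks the executor.
    let stream = TcpStream::connect(addr).await?;

    // into_std keeps the socket in non-blocking mode, so it can be converted
    // back to a tokio stream afterwards without further changes.
    let std_stream = stream.into_std()?;
    let socket = Socket::from(std_stream);
    socket.set_tcp_keepalive(&TcpKeepalive::new().with_time(keepalive))?;

    // Hand the configured socket back to tokio.
    TcpStream::from_std(socket.into())
}
```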

@rupansh
Author

rupansh commented Jan 13, 2021

It seems like neither runtime supports TCP keepalives directly. At the same time, using a synchronous connect may block the executor completely and break the whole async behavior.
What we can do is establish the connection asynchronously, convert it to std::net::TcpStream, enable keepalives, and convert back to the runtime's implementation of TcpStream. For tokio this is easily achievable through into_std; I believe async-std has similar options.

Or we can use RUNTIME.spawn_blocking. Either way, I need to know if the latest change made any difference, and we can proceed accordingly.
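
For reference, a hedged sketch of the spawn_blocking route, using tokio::task directly in place of the driver's RUNTIME wrapper (illustrative only):

```rust
use std::net::{SocketAddr, TcpStream as StdTcpStream};

use tokio::net::TcpStream;

// Hypothetical: offload the synchronous connect to the blocking thread pool so it
// cannot stall the async executor.
async fn connect_on_blocking_pool(addr: SocketAddr) -> std::io::Result<TcpStream> {
    let std_stream = tokio::task::spawn_blocking(move || StdTcpStream::connect(addr))
        .await
        .expect("spawn_blocking task panicked")?;

    // tokio still requires the socket to be non-blocking before registration.
    std_stream.set_nonblocking(true)?;
    TcpStream::from_std(std_stream)
}
```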

@glebpom

glebpom commented Jan 13, 2021

@rupansh I still see the timeout with the latest version.

async-std automatically detects blocking code. No need to use it there

Signed-off-by: rupansh-arch <rupanshsekar@hotmail.com>
@rupansh
Author

rupansh commented Jan 14, 2021

@rupansh I still see the timeout with the latest version.

https://github.com/rupansh/mongo-rust-driver/tree/tcpstream_timeout
Can you check it?
Also, it would be better if we could continue our discussion here: rupansh#2

@Darksonn

Be aware that Tokio provides a TcpSocket type for asynchronously connecting a socket with special options set through e.g. socket2.
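
A hedged sketch of that suggestion, assuming tokio 1.x and socket2 0.4 on a Unix-like target (names and the keepalive time are illustrative):

```rust
use std::net::SocketAddr;
use std::time::Duration;

use socket2::{SockRef, TcpKeepalive};
use tokio::net::{TcpSocket, TcpStream};

// Hypothetical: configure the socket before connecting, then connect asynchronously.
// Assumes an IPv4 destination address.
async fn connect_with_options(addr: SocketAddr) -> std::io::Result<TcpStream> {
    // An unconnected socket; options can still be applied to it.
    let socket = TcpSocket::new_v4()?;

    // SockRef borrows the raw fd, so socket2 can set options tokio doesn't expose.
    SockRef::from(&socket)
        .set_tcp_keepalive(&TcpKeepalive::new().with_time(Duration::from_secs(120)))?;

    // Fully asynchronous connect that consumes the configured socket.
    socket.connect(addr).await
}
```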

@aramperes

Looks like this got superseded by #283 and is pending the 2.0.0 release?

@patrickfreed
Contributor

Just as a heads up: we just released version 2.0.0-alpha, which has support for tokio 1.0. Please check it out and let us know of any issues you encounter. The API will be unstable for the time being as we make minor breakages pre-2.0. Thank you for your continued patience while we work towards a stable release supporting tokio 1.0!

I'm going to close out this PR now, please use our Jira tracker to continue the discussion or submit any bug reports.
